{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Engineering Dates\n",
    "\n",
    "Date variables are special type of categorical variable. By their own nature, date variables will contain a multitude of different labels, each one corresponding to a specific date and sometimes time. Date variables, when preprocessed properly can highly enrich a dataset. For example, from a date variable we can extract:\n",
    "\n",
    "- Month\n",
    "- Quarter\n",
    "- Semester\n",
    "- Day (number)\n",
    "- Day of the week\n",
    "- Is Weekend?\n",
    "- Hr\n",
    "- Time differences in years, months, days, hrs, etc.\n",
    "\n",
    "It is important to understand that date variables should not be used as the categorical variables we have been working so far when building a machine learning model. Not only because they have a multitude of categories, but also because when we actually use the model to score a new observation, this observation will most likely be in the future, an therefore its date label, will be different than the ones contained in the training set and therefore the ones used to train the machine learning algorithm.\n",
    "\n",
    "=============================================================================\n",
    "\n",
    "Let's look at how to pre-process date variables in a real life example.\n",
    "\n",
    "### Lending Club\n",
    "\n",
    "**Lending Club** is a peer-to-peer Lending company based in the US. They match people looking to invest money with people looking to borrow money. When investors invest their money through Lending Club, this money is passed onto borrowers, and when borrowers pay their loans back, the capital plus the interest passes on back to the investors. It is a win for everybody as they can get typically lower loan rates and higher investor returns.\n",
    "\n",
    "If you want to learn more about Lending Club follow this link:\n",
    "https://www.lendingclub.com/\n",
    "\n",
    "The Lending Club dataset contains complete loan data for all loans issued through the 2007-2015, including the current loan status (Current, Late, Fully Paid, etc.) and latest payment information. Features (aka variables) include credit scores, number of finance inquiries, address including zip codes and state, and collections among others. Collections indicates whether the customer has missed one or more payments and the team is trying to recover their money.\n",
    "The file is a matrix of about 890 thousand observations and 75 variables. More detail on this dataset can be found in Kaggle's website: https://www.kaggle.com/wendykan/lending-club-loan-data\n",
    "\n",
    "Let's go ahead and have a look!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "import datetime"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_d</th>\n",
       "      <th>last_pymnt_d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Jan-2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Apr-2013</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Jun-2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Jan-2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Jan-2016</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_d last_pymnt_d\n",
       "0  Dec-2011     Jan-2015\n",
       "1  Dec-2011     Apr-2013\n",
       "2  Dec-2011     Jun-2014\n",
       "3  Dec-2011     Jan-2015\n",
       "4  Dec-2011     Jan-2016"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# let's load the Lending Club dataset with a few selected columns\n",
    "# just a few rows to speed things up\n",
    "\n",
    "use_cols = ['issue_d', 'last_pymnt_d']\n",
    "data = pd.read_csv('loan.csv', usecols=use_cols, nrows=10000)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_d</th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>last_pymnt_d</th>\n",
       "      <th>last_pymnt_dt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Jan-2015</td>\n",
       "      <td>2015-01-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Apr-2013</td>\n",
       "      <td>2013-04-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Jun-2014</td>\n",
       "      <td>2014-06-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Jan-2015</td>\n",
       "      <td>2015-01-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Jan-2016</td>\n",
       "      <td>2016-01-01</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_d   issue_dt last_pymnt_d last_pymnt_dt\n",
       "0  Dec-2011 2011-12-01     Jan-2015    2015-01-01\n",
       "1  Dec-2011 2011-12-01     Apr-2013    2013-04-01\n",
       "2  Dec-2011 2011-12-01     Jun-2014    2014-06-01\n",
       "3  Dec-2011 2011-12-01     Jan-2015    2015-01-01\n",
       "4  Dec-2011 2011-12-01     Jan-2016    2016-01-01"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# now let's parse the dates, currently coded as strings, into datetime format\n",
    "\n",
    "data['issue_dt'] = pd.to_datetime(data.issue_d)\n",
    "data['last_pymnt_dt'] = pd.to_datetime(data.last_pymnt_d)\n",
    "\n",
    "data[['issue_d','issue_dt','last_pymnt_d', 'last_pymnt_dt']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_month</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_dt  issue_dt_month\n",
       "0 2011-12-01              12\n",
       "1 2011-12-01              12\n",
       "2 2011-12-01              12\n",
       "3 2011-12-01              12\n",
       "4 2011-12-01              12"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Extracting Month from date\n",
    "\n",
    "data['issue_dt_month'] = data['issue_dt'].dt.month\n",
    "\n",
    "data[['issue_dt', 'issue_dt_month']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_month</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9995</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9996</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9997</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9998</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9999</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       issue_dt  issue_dt_month\n",
       "9995 2011-08-01               8\n",
       "9996 2011-08-01               8\n",
       "9997 2011-08-01               8\n",
       "9998 2011-08-01               8\n",
       "9999 2011-08-01               8"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[['issue_dt', 'issue_dt_month']].tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_quarter</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_dt  issue_dt_quarter\n",
       "0 2011-12-01                 4\n",
       "1 2011-12-01                 4\n",
       "2 2011-12-01                 4\n",
       "3 2011-12-01                 4\n",
       "4 2011-12-01                 4"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Extract quarter from date variable\n",
    "\n",
    "data['issue_dt_quarter'] = data['issue_dt'].dt.quarter\n",
    "\n",
    "data[['issue_dt', 'issue_dt_quarter']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_quarter</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9995</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9996</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9997</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9998</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9999</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       issue_dt  issue_dt_quarter\n",
       "9995 2011-08-01                 3\n",
       "9996 2011-08-01                 3\n",
       "9997 2011-08-01                 3\n",
       "9998 2011-08-01                 3\n",
       "9999 2011-08-01                 3"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[['issue_dt', 'issue_dt_quarter']].tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_d</th>\n",
       "      <th>last_pymnt_d</th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>last_pymnt_dt</th>\n",
       "      <th>issue_dt_month</th>\n",
       "      <th>issue_dt_quarter</th>\n",
       "      <th>issue_dt_semester</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Jan-2015</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2015-01-01</td>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Apr-2013</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2013-04-01</td>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Jun-2014</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2014-06-01</td>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Jan-2015</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2015-01-01</td>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Dec-2011</td>\n",
       "      <td>Jan-2016</td>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2016-01-01</td>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_d last_pymnt_d   issue_dt last_pymnt_dt  issue_dt_month  \\\n",
       "0  Dec-2011     Jan-2015 2011-12-01    2015-01-01              12   \n",
       "1  Dec-2011     Apr-2013 2011-12-01    2013-04-01              12   \n",
       "2  Dec-2011     Jun-2014 2011-12-01    2014-06-01              12   \n",
       "3  Dec-2011     Jan-2015 2011-12-01    2015-01-01              12   \n",
       "4  Dec-2011     Jan-2016 2011-12-01    2016-01-01              12   \n",
       "\n",
       "   issue_dt_quarter  issue_dt_semester  \n",
       "0                 4                  2  \n",
       "1                 4                  2  \n",
       "2                 4                  2  \n",
       "3                 4                  2  \n",
       "4                 4                  2  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# We could also extract semester\n",
    "\n",
    "data['issue_dt_semester'] = np.where(data.issue_dt_quarter.isin([1,2]),1,2)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_day</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_dt  issue_dt_day\n",
       "0 2011-12-01             1\n",
       "1 2011-12-01             1\n",
       "2 2011-12-01             1\n",
       "3 2011-12-01             1\n",
       "4 2011-12-01             1"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# day - numeric from 1-31\n",
    "\n",
    "data['issue_dt_day'] = data['issue_dt'].dt.day\n",
    "\n",
    "data[['issue_dt', 'issue_dt_day']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_dayofweek</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_dt  issue_dt_dayofweek\n",
       "0 2011-12-01                   3\n",
       "1 2011-12-01                   3\n",
       "2 2011-12-01                   3\n",
       "3 2011-12-01                   3\n",
       "4 2011-12-01                   3"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# day of the week - from 0 to 6\n",
    "\n",
    "data['issue_dt_dayofweek'] = data['issue_dt'].dt.dayofweek\n",
    "\n",
    "data[['issue_dt', 'issue_dt_dayofweek']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_dayofweek</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9995</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9996</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9997</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9998</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9999</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       issue_dt  issue_dt_dayofweek\n",
       "9995 2011-08-01                   0\n",
       "9996 2011-08-01                   0\n",
       "9997 2011-08-01                   0\n",
       "9998 2011-08-01                   0\n",
       "9999 2011-08-01                   0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[['issue_dt', 'issue_dt_dayofweek']].tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_dayofweek</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_dt issue_dt_dayofweek\n",
       "0 2011-12-01           Thursday\n",
       "1 2011-12-01           Thursday\n",
       "2 2011-12-01           Thursday\n",
       "3 2011-12-01           Thursday\n",
       "4 2011-12-01           Thursday"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# day of the week - name\n",
    "\n",
    "data['issue_dt_dayofweek'] = data['issue_dt'].dt.weekday_name\n",
    "\n",
    "data[['issue_dt', 'issue_dt_dayofweek']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_dayofweek</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9995</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>Monday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9996</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>Monday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9997</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>Monday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9998</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>Monday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9999</th>\n",
       "      <td>2011-08-01</td>\n",
       "      <td>Monday</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       issue_dt issue_dt_dayofweek\n",
       "9995 2011-08-01             Monday\n",
       "9996 2011-08-01             Monday\n",
       "9997 2011-08-01             Monday\n",
       "9998 2011-08-01             Monday\n",
       "9999 2011-08-01             Monday"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[['issue_dt', 'issue_dt_dayofweek']].tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_dayofweek</th>\n",
       "      <th>issue_dt_is_weekend</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>Thursday</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_dt issue_dt_dayofweek  issue_dt_is_weekend\n",
       "0 2011-12-01           Thursday                    0\n",
       "1 2011-12-01           Thursday                    0\n",
       "2 2011-12-01           Thursday                    0\n",
       "3 2011-12-01           Thursday                    0\n",
       "4 2011-12-01           Thursday                    0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# was the application done on the weekend?\n",
    "\n",
    "data['issue_dt_is_weekend'] = np.where(data['issue_dt_dayofweek'].isin(['Sunday', 'Saturday']), 1,0)\n",
    "data[['issue_dt', 'issue_dt_dayofweek','issue_dt_is_weekend']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_dayofweek</th>\n",
       "      <th>issue_dt_is_weekend</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4269</th>\n",
       "      <td>2011-10-01</td>\n",
       "      <td>Saturday</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4280</th>\n",
       "      <td>2011-10-01</td>\n",
       "      <td>Saturday</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4287</th>\n",
       "      <td>2011-10-01</td>\n",
       "      <td>Saturday</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4290</th>\n",
       "      <td>2011-10-01</td>\n",
       "      <td>Saturday</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4293</th>\n",
       "      <td>2011-10-01</td>\n",
       "      <td>Saturday</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       issue_dt issue_dt_dayofweek  issue_dt_is_weekend\n",
       "4269 2011-10-01           Saturday                    1\n",
       "4280 2011-10-01           Saturday                    1\n",
       "4287 2011-10-01           Saturday                    1\n",
       "4290 2011-10-01           Saturday                    1\n",
       "4293 2011-10-01           Saturday                    1"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[data.issue_dt_is_weekend==1][['issue_dt', 'issue_dt_dayofweek','issue_dt_is_weekend']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>issue_dt</th>\n",
       "      <th>issue_dt_year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2011-12-01</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    issue_dt  issue_dt_year\n",
       "0 2011-12-01           2011\n",
       "1 2011-12-01           2011\n",
       "2 2011-12-01           2011\n",
       "3 2011-12-01           2011\n",
       "4 2011-12-01           2011"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# extract year \n",
    "\n",
    "data['issue_dt_year'] = data['issue_dt'].dt.year\n",
    "\n",
    "data[['issue_dt', 'issue_dt_year']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      -1127 days\n",
       "1       -487 days\n",
       "2       -913 days\n",
       "3      -1127 days\n",
       "4      -1492 days\n",
       "5      -1127 days\n",
       "6      -1492 days\n",
       "7      -1127 days\n",
       "8       -122 days\n",
       "9       -336 days\n",
       "10      -548 days\n",
       "11      -640 days\n",
       "12      -213 days\n",
       "13     -1127 days\n",
       "14      -670 days\n",
       "15     -1127 days\n",
       "16     -1127 days\n",
       "17      -517 days\n",
       "18     -1158 days\n",
       "19      -213 days\n",
       "20     -1339 days\n",
       "21      -640 days\n",
       "22     -1127 days\n",
       "23      -670 days\n",
       "24      -305 days\n",
       "25      -275 days\n",
       "26            NaT\n",
       "27      -366 days\n",
       "28      -487 days\n",
       "29     -1096 days\n",
       "          ...    \n",
       "9970   -1096 days\n",
       "9971   -1188 days\n",
       "9972    -519 days\n",
       "9973    -274 days\n",
       "9974   -1096 days\n",
       "9975   -1004 days\n",
       "9976    -274 days\n",
       "9977   -1096 days\n",
       "9978   -1096 days\n",
       "9979    -762 days\n",
       "9980   -1492 days\n",
       "9981    -397 days\n",
       "9982   -1004 days\n",
       "9983     -31 days\n",
       "9984    -974 days\n",
       "9985   -1096 days\n",
       "9986   -1096 days\n",
       "9987    -731 days\n",
       "9988    -853 days\n",
       "9989    -915 days\n",
       "9990    -244 days\n",
       "9991   -1096 days\n",
       "9992   -1096 days\n",
       "9993    -458 days\n",
       "9994   -1614 days\n",
       "9995    -974 days\n",
       "9996   -1522 days\n",
       "9997   -1096 days\n",
       "9998   -1096 days\n",
       "9999    -153 days\n",
       "Length: 10000, dtype: timedelta64[ns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# perhaps more interestingly, extract the date difference between 2 dates\n",
    "\n",
    "data['issue_dt'] - data['last_pymnt_dt']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1127.0\n",
       "1     487.0\n",
       "2     913.0\n",
       "3    1127.0\n",
       "4    1492.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# same as above capturing just the time difference\n",
    "(data['last_pymnt_dt']-data['issue_dt']).dt.days.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   2215 days 17:19:22.155930\n",
       "1   2215 days 17:19:22.155930\n",
       "2   2215 days 17:19:22.155930\n",
       "3   2215 days 17:19:22.155930\n",
       "4   2215 days 17:19:22.155930\n",
       "Name: issue_dt, dtype: timedelta64[ns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# or the time difference to today, or any other day of reference\n",
    "\n",
    "(datetime.datetime.today() - data['issue_dt']).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9995   2337 days 17:19:22.173943\n",
       "9996   2337 days 17:19:22.173943\n",
       "9997   2337 days 17:19:22.173943\n",
       "9998   2337 days 17:19:22.173943\n",
       "9999   2337 days 17:19:22.173943\n",
       "Name: issue_dt, dtype: timedelta64[ns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(datetime.datetime.today() - data['issue_dt']).tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Similarly, if this variable had time as well, we could calculate, at what time the application was done in hrs for example, and then segregate it in: morning-afternoon-evening.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Very common date differences used in the industry include \"age\" using \"date of birth\" and the \"time of application\" (i.e., how old was the applicant when they applied for a loan). Other examples include how long the applicant has lived at th declared address.\n",
    "\n",
    "### Note\n",
    "\n",
    "Once preprocessed in these ways, the variables are ready to be used in machine learning models following typical preprocessing of numerical or categorical variables, as shown in previous and future sections in this course."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": "block",
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
